{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Euclidean Distance \n",
    "\n",
    "Welcome to your 1-st assignment. By working through this exercise you will learn how to\n",
    "\n",
    "- do this\n",
    "- understand this\n",
    "- learn that\n",
    "\n",
    "**Instructions:**\n",
    "- You will be using Python 3.\n",
    "- Avoid using for-loops and while-loops, unless you are explicitly told to do so.\n",
    "- Do not modify the (# GRADED FUNCTION [function name]) comment in some cells. Your work would not be graded if you change this. Each cell containing that comment should only contain one function.\n",
    "- After coding your function, run the cell right below it to check if your result is correct.\n",
    "\n",
    "**After this assignment you will:**\n",
    "- know how to do this\n",
    "- understand so and so\n",
    "\n",
    "Let's get started!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Dataset\n",
    "Suppose we have a $n$ dimensional space $\\mathbb{R}^{n}$, we want to generate $1000000$ pairs of uniformly distributed random\n",
    "numbers $X\\sim\\mathscr{U}\\left(-1,\\:1\\right)$. \n",
    "\n",
    "For instance, if $n=1$, we generate $p_{1}=\\left(x_{1},\\:y_{1}\\right)$, $p_{2}=\\left(x_{2},\\:y_{2}\\right)$, $\\cdots$, $p_{1000000}=\\left(x_{1000000},\\:y_{1000000}\\right)$, where $x_{1}$, $x_{2}$, $\\cdots$, $x_{1000000}$ are uniformly distributed, $y_{1}$, $y_{2}$, $\\cdots$, $y_{1000000}$ are uniformly distributed too. \n",
    "\n",
    "If $n=2$, we generate $\\mathbf{p}_{1}=\\left(\\mathbf{x}_{1},\\:\\mathbf{y}_{1}\\right)$, where $\\mathbf{x}_{1}=\\left(x_{1}^{\\left(1\\right)},\\:x_{1}^{\\left(2\\right)}\\right)$ and $\\mathbf{y}_{1}=\\left(y_{1}^{\\left(1\\right)},\\:y_{1}^{\\left(2\\right)}\\right)$, $\\mathbf{p}_{2}=\\left(\\mathbf{x}_{2},\\:\\mathbf{y}_{2}\\right)$, where $\\mathbf{x}_{2}=\\left(x_{2}^{\\left(1\\right)},\\:x_{2}^{\\left(2\\right)}\\right)$ and $\\mathbf{y}_{2}=\\left(y_{2}^{\\left(1\\right)},\\:y_{2}^{\\left(2\\right)}\\right)$, $\\cdots$, $\\mathbf{p}_{1000000}=\\left(\\mathbf{x}_{1000000},\\:\\mathbf{y}_{1000000}\\right)$, where $\\mathbf{x}_{1000000}=\\left(x_{1000000}^{\\left(1\\right)},\\:x_{1000000}^{\\left(2\\right)}\\right)$ and $\\mathbf{y}_{1000000}=\\left(y_{1000000}^{\\left(1\\right)},\\:y_{1000000}^{\\left(2\\right)}\\right)$, and $x_{1}^{\\left(1\\right)}$, $x_{2}^{\\left(1\\right)}$, $\\cdots$, $x_{1000000}^{\\left(1\\right)}$ are uniformly distributed, $x_{1}^{\\left(2\\right)}$, $x_{2}^{\\left(2\\right)}$, $\\cdots$, $x_{1000000}^{\\left(2\\right)}$ are uniformly distributed, $y_{1}^{\\left(1\\right)}$, $y_{2}^{\\left(1\\right)}$, $\\cdots$, $y_{1000000}^{\\left(1\\right)}$ are uniformly distributed, and $y_{1}^{\\left(2\\right)}$, $y_{2}^{\\left(2\\right)}$, $\\cdots$, $y_{1000000}^{\\left(2\\right)}$ are uniformly distributed too. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# imports \n",
    "import numpy as np\n",
    "# import matplotlib.pyplot as plt \n",
    "# %matplotlib inline\n",
    "\n",
    "from sklearn.metrics.pairwise import euclidean_distances\n",
    "\n",
    "import sys\n",
    "sys.path.append(\"..\")\n",
    "import grading\n",
    "\n",
    "import timeit\n",
    "import matplotlib.mlab\n",
    "import scipy.stats\n",
    "from scipy.stats import norm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "### ONLY FOR GRADING. DO NOT EDIT ###\n",
    "submissions=dict()\n",
    "assignment_key=\"2RRok_GPEeeQZgq5AVms2g\" \n",
    "all_parts=[\"pmqxU\", \"VrXL6\", \"XsLp1\",\"jD7SY\",\"Ad4J0\",\"1nPFm\"]\n",
    "### ONLY FOR GRADING. DO NOT EDIT ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "COURSERA_TOKEN = \" \"# the key provided to the Student under his/her email on submission page\n",
    "COURSERA_EMAIL = \" \"# the email"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def euclidean_distances_stats(euclidean_distances_vector):\n",
    "    \"\"\"\n",
    "    Calculate Euclidean distances statistics\n",
    "    \n",
    "    Arguments:\n",
    "    euclidean_distances_vector - 1-D vector of Euclidean distances\n",
    "    \n",
    "    Return:\n",
    "        np.array() of length 4\n",
    "        the first element of array is the mean\n",
    "        the second element is variance\n",
    "        the third element is skew of the distribution\n",
    "        the forth element is kurtusis of the distribution\n",
    "    \"\"\"\n",
    "    if len(euclidean_distances_vector) > 0:\n",
    "        this_mean = np.mean( euclidean_distances_vector )\n",
    "        this_variance = np.var( euclidean_distances_vector )\n",
    "        this_skewness = scipy.stats.skew( euclidean_distances_vector )    \n",
    "        this_kurtosis = scipy.stats.kurtosis( euclidean_distances_vector )\n",
    "        result = np.array([this_mean, this_variance, this_skewness, this_kurtosis])\n",
    "    else:\n",
    "        result = np.array([0.] * 4)\n",
    "    return result\n",
    "\n",
    "\n",
    "def print_stats(euclidean_stats):\n",
    "    \"\"\"\n",
    "    Print Euclidean distances statistics\n",
    "    \n",
    "    Arguments: \n",
    "    euclidean_stats - np.array() of length 4\n",
    "        the first element of array is the mean\n",
    "        the second element is variance\n",
    "        the third element is skew of the distribution\n",
    "        the forth element is kurtusis of the distribution\n",
    "    \"\"\"\n",
    "    this_mean = euclidean_stats[0]\n",
    "    this_variance = euclidean_stats[1]\n",
    "    this_skewness = euclidean_stats[2]\n",
    "    this_kurtosis = euclidean_stats[3]\n",
    "    print( 'Expectation of Euclidean distances: ', this_mean, '\\n' )\n",
    "    print( 'Variance of Euclidean distances: ', this_variance, '\\n' )\n",
    "    print( 'Skewness of Euclidean distances: ', this_skewness, '\\n' )\n",
    "    print( 'Kurtosis of Euclidean distances: ',this_kurtosis, '\\n' )\n",
    "\n",
    "\n",
    "def plot_distribution(euclidean_distances_vector, euclidean_stats, dim_space, bins_number=30):\n",
    "    \"\"\"\n",
    "    Plot histogram of Euclidean distances against normal distribution PDF\n",
    "    \n",
    "    Arguments: \n",
    "    \n",
    "    euclidean_distances_vector - 1-D vector of Euclidean distances\n",
    "    \n",
    "    euclidean_stats - np.array() of length 4\n",
    "        the first element of array is the mean\n",
    "        the second element is variance\n",
    "        the third element is skew of the distribution\n",
    "        the forth element is kurtusis of the distribution\n",
    "    \n",
    "    dim_space - dimension of the space\n",
    "    bins_number - number of bins in the histogram\n",
    "    \"\"\"\n",
    "    # verbose, but this is for clarity\n",
    "    this_mean = euclidean_stats[0]\n",
    "    this_variance = euclidean_stats[1]\n",
    "    this_skewness = euclidean_stats[2]\n",
    "    this_kurtosis = euclidean_stats[3]\n",
    "    \n",
    "    sample_size = len(euclidean_distances_vector)\n",
    "    try:\n",
    "        fig_l, ax_l = plt.subplots()\n",
    "        n_bins_l, bins_l, patches_l = ax_l.hist( euclidean_distances_vector, bins_number, normed=1 )  \n",
    "        y_l = matplotlib.mlab.normpdf( bins_l, this_mean, np.sqrt( this_variance ) )\n",
    "        ax_l.plot( bins_l, y_l, 'r--' )\n",
    "        plt.title( 'Histogram for dimension = %d and sample size = %d \\n $\\mu$ = %.3f, $\\sigma^2$ = %.3f, Skewness = %.3f, Kurtosis = %.3f' \\\n",
    "                                           % (dim_space, sample_size, this_mean, this_variance, this_skewness, this_kurtosis ) )\n",
    "        fig_l.tight_layout()\n",
    "        plt.grid( True, which='both')\n",
    "        plt.minorticks_on()\n",
    "        return fig_l\n",
    "    except:\n",
    "        return None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X:  [[ 0.09220363  0.85065196  0.90075012  0.59361319  0.84875299]\n",
      " [ 0.13300259  0.50209599  0.76796562  0.92047036  0.47544869]\n",
      " [ 0.72927521  0.8054414   0.4002669   0.01355402  0.31719426]\n",
      " ..., \n",
      " [ 0.82071112  0.46084335  0.92036074  0.31746465  0.03535725]\n",
      " [ 0.21581585  0.12317179  0.42738517  0.35466096  0.93360429]\n",
      " [ 0.84577044  0.67545711  0.22706133  0.58893715  0.98216918]]\n",
      "Y:  [[ 0.32900813  0.34963352  0.52804383  0.38208285  0.03237214]\n",
      " [ 0.11760546  0.46402303  0.12260294  0.18876132  0.99071561]\n",
      " [ 0.49587495  0.18125864  0.61421199  0.29089588  0.71308158]\n",
      " ..., \n",
      " [ 0.14440936  0.38925149  0.50634999  0.29421895  0.96282509]\n",
      " [ 0.15239208  0.4741476   0.84900715  0.70515312  0.22175127]\n",
      " [ 0.46490389  0.50546926  0.04574762  0.75900819  0.25636212]]\n"
     ]
    }
   ],
   "source": [
    "lower_boundary = 0\n",
    "upper_boundary = 1\n",
    "n = 5 # dimension\n",
    "sample_size = 10000\n",
    "\n",
    "np.random.seed(9001) # set the seed to yield reproducible results\n",
    "\n",
    "X = np.random.uniform( low=lower_boundary, high=upper_boundary, size=(sample_size, n) )\n",
    "Y = np.random.uniform( low=lower_boundary, high=upper_boundary, size=(sample_size, n) )\n",
    "\n",
    "print( 'X: ', X )\n",
    "print( 'Y: ', Y )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1\n",
    "Calculate the Euclidean distance between the two points of each pair. Do this in a loop. Hint: use sklearn to do the computation.\n",
    "\n",
    "Plot the histogram of the Euclidean distance. In a $n$ dimensional space $\\mathbb{R}^{n}$, the Euclidean distance between $\\mathbf{x}=\\left(x_{1},\\:x_{2},\\:\\cdots,\\:x_{n}\\right)$ and $\\mathbf{y}=\\left(y_{1},\\:y_{2},\\:\\cdots,\\:y_{n}\\right)$ is given\n",
    "by \n",
    "\\begin{equation}\n",
    "\\begin{aligned}d_{E}\\left(\\mathbf{p},\\:\\mathbf{q}\\right) & =\\sqrt{\\left(x_{1}-y_{1}\\right)^{2}+\\left(x_{2}-y_{2}\\right)^{2}+\\cdots+\\left(x_{n}-y_{n}\\right)^{2}}\\\\\n",
    " & =\\sqrt{\\sum_{i=1}^{n}\\left(x_{i}-y_{i}\\right)^{2}}\\\\\n",
    " & =\\left\\Vert \\mathbf{x}-\\mathbf{y}\\right\\Vert _{2}\n",
    "\\end{aligned}\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running time:  2.663071423768997\n"
     ]
    }
   ],
   "source": [
    "start = timeit.default_timer()\n",
    "### START CODE HERE ### (≈ 4 lines of code)\n",
    "# implement a loop which computes Euclidean distances between each element in X and Y\n",
    "# store results in euclidean_distances_vector_l list\n",
    "euclidean_distances_vector_l = []\n",
    "# implement a loop which computes Euclidean distances between each element in X and Y\n",
    "# store results in euclidean_distances_vector_l list\n",
    "for index, x in enumerate(X):\n",
    "    euclidean_distances_vector_l.append(euclidean_distances(x.reshape(1, -1), Y[index].reshape(1, -1)))\n",
    "### END CODE HERE ###\n",
    "stop = timeit.default_timer()\n",
    "print( 'Running time: ', stop-start )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Submission successful, please check on the coursera grader page for the status\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([ 0.87662633,  0.06098537, -0.03504537, -0.26237711])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filename: SklearnDistance, PART: pmqxU\n",
    "### GRADED PART (DO NOT EDIT) ###\n",
    "result = euclidean_distances_stats(euclidean_distances_vector_l)\n",
    "part_1 = list(result.squeeze())\n",
    "try:\n",
    "    part1 = \" \".join(map(repr, part_1))\n",
    "except TypeError:\n",
    "    part1 = repr(part_1)\n",
    "submissions[all_parts[0]]=part1\n",
    "grading.submit(COURSERA_EMAIL, COURSERA_TOKEN, assignment_key,all_parts[:1],all_parts,submissions)\n",
    "result\n",
    "### GRADED PART (DO NOT EDIT) ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Expectation of Euclidean distances:  0.876626326649 \n",
      "\n",
      "Variance of Euclidean distances:  0.0609853651691 \n",
      "\n",
      "Skewness of Euclidean distances:  -0.0350453681886 \n",
      "\n",
      "Kurtosis of Euclidean distances:  -0.262377106269 \n",
      "\n"
     ]
    }
   ],
   "source": [
    "print_stats(result)\n",
    "plot_distribution(euclidean_distances_vector_l, result, n)\n",
    "try:\n",
    "    plt.show()\n",
    "except: pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2\n",
    "Calculate the Euclidean distance between the two points of each pair using vectorized operations and inner product."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running time:  0.0016185259446501732\n"
     ]
    }
   ],
   "source": [
    "# using vectorization by calculating inner product\n",
    "start = timeit.default_timer()\n",
    "# variables needed for grading\n",
    "\n",
    "### START CODE HERE ### (≈ 3 lines of code)\n",
    "# compute Euclidean distances between each element in X and Y using (vectorized implementation)\n",
    "# store results in euclidean_distances_vector_v \n",
    "euclidean_distances_vector_l_vectorized = np.sqrt(np.sum((X - Y) * (X - Y), axis=1))\n",
    "### END CODE HERE ###\n",
    "stop = timeit.default_timer()\n",
    "print( 'Running time: ', stop-start )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Submission successful, please check on the coursera grader page for the status\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([ 0.87662633,  0.06098537, -0.03504537, -0.26237711])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filename: VectorizedDistance, PART: VrXL6\n",
    "### GRADED PART (DO NOT EDIT) ### \n",
    "result = euclidean_distances_stats(euclidean_distances_vector_l_vectorized)\n",
    "part_2 = result.squeeze()\n",
    "try:\n",
    "    part2 = \" \".join(map(repr, part_2))\n",
    "except TypeError:\n",
    "    part2 = repr(part_2)\n",
    "submissions[all_parts[1]]=part2\n",
    "grading.submit(COURSERA_EMAIL, COURSERA_TOKEN, assignment_key,all_parts[:2],all_parts,submissions)\n",
    "result\n",
    "### GRADED PART (DO NOT EDIT) ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Expectation of Euclidean distances:  0.876626326649 \n",
      "\n",
      "Variance of Euclidean distances:  0.0609853651691 \n",
      "\n",
      "Skewness of Euclidean distances:  -0.0350453681886 \n",
      "\n",
      "Kurtosis of Euclidean distances:  -0.262377106269 \n",
      "\n"
     ]
    }
   ],
   "source": [
    "print_stats(result)\n",
    "fig = plot_distribution(euclidean_distances_vector_l_vectorized, result, n)\n",
    "try:\n",
    "    plt.plot()\n",
    "except: pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Question 3 \n",
    "We repeat question 1 and question 2 for $n=1$, $n=5$, $n=10$, $n=100$, $n=1000$, $n=5000$, and $n=10000$. Then plot the expectation and variance as a function of $n$.\n",
    "You need to generate two sets of n-dimensional samples, compute "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def VectorizationMethod(dim_space, sample_size, lower_boundary, upper_boundary, bins_number=30):\n",
    "    \"\"\"\n",
    "    Generate sample_size elements from dim_space-dimensional space. The coordinates of each element in the space\n",
    "    are sampled from uniform distribution between lower_boundary and upper_boundary\n",
    "    \n",
    "    Arguments: \n",
    "    \n",
    "    dim_space - dimension of the space, a positive integer\n",
    "    sample_size - number of samples in the dim_space-dimensional space\n",
    "    lower_boundary - lower boundary of coordinates sampled from U(lower_boundary, upper_boundary)\n",
    "    upper_boundary - lower boundary of coordinates sampled from U(lower_boundary, upper_boundary)\n",
    "    bins_number - number of bins to plot a histogram\n",
    "    \n",
    "    stats_result - np.array() of length 4\n",
    "        the first element of array is the mean\n",
    "        the second element is variance\n",
    "        the third element is skew of the distribution\n",
    "        the forth element is kurtusis of the distribution\n",
    "    \"\"\"\n",
    "    np.random.seed(42)\n",
    "    # variables needed for grading\n",
    "#     euclidean_distances_vector_v = []\n",
    "    ### START CODE HERE ### (≈ 7-10 lines of code)\n",
    "    # store results in euclidean_distances_vector_v\n",
    "    euclidean_distances_vector_v = np.array([])\n",
    "    X = np.random.uniform(low=lower_boundary, high=upper_boundary, size=(sample_size, dim_space))\n",
    "    Y = np.random.uniform(low=lower_boundary, high=upper_boundary, size=(sample_size, dim_space))\n",
    "    euclidean_distances_vector_v = np.sqrt(np.sum((X - Y) * (X - Y), axis=1))\n",
    "   \n",
    "    stats_result = euclidean_distances_stats(euclidean_distances_vector_v)\n",
    "    fig = plot_distribution(euclidean_distances_vector_v, stats_result, dim_space)\n",
    "    ### END CODE HERE ###\n",
    "    stats_result = euclidean_distances_stats(euclidean_distances_vector_v)\n",
    "    return tuple(stats_result.tolist())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calculating finished for sample size = 10000, dimension = 2\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 5\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 10\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 20\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 40\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 60\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 80\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 100\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 200\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 400\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 600\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 800\n",
      "\n",
      "Calculating finished for sample size = 10000, dimension = 1000\n",
      "\n",
      "Running time:  3.0821857806295156\n"
     ]
    }
   ],
   "source": [
    "start = timeit.default_timer()\n",
    "\n",
    "sample_size = 10000\n",
    "lower_boundary = 0\n",
    "upper_boundary = 1\n",
    "dimension_vector = [2, 5, 10, 20, 40, 60, 80, 100, 200, 400, 600, 800, 1000] \n",
    "n_dims = len(dimension_vector)\n",
    "\n",
    "euclidean_distances_mean_vector = [np.nan] * n_dims\n",
    "euclidean_distances_variance_vector = [np.nan] * n_dims\n",
    "euclidean_distances_skewness_vector = [np.nan] * n_dims\n",
    "euclidean_distances_kurtosis_vector = [np.nan] * n_dims\n",
    "\n",
    "for idx, space_dims in enumerate(dimension_vector):\n",
    "    \n",
    "    # using vectorization\n",
    "    euclidean_distances_mean, euclidean_distances_variance, euclidean_distances_skewness, euclidean_distances_kurtosis = \\\n",
    "                 VectorizationMethod( space_dims, sample_size, lower_boundary, upper_boundary )\n",
    "        \n",
    "    euclidean_distances_mean_vector[idx] = euclidean_distances_mean\n",
    "    euclidean_distances_variance_vector[idx] = euclidean_distances_variance\n",
    "    euclidean_distances_skewness_vector[idx] = euclidean_distances_skewness\n",
    "    euclidean_distances_kurtosis_vector[idx] = euclidean_distances_kurtosis\n",
    "    \n",
    "    print( 'Calculating finished for sample size = %d, dimension = %d\\n' %( sample_size, space_dims) )\n",
    "\n",
    "stop = timeit.default_timer()\n",
    "print( 'Running time: ', stop-start )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Submission successful, please check on the coursera grader page for the status\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[0.5244117684024786,\n",
       " 0.8822841161864812,\n",
       " 1.2676717606162842,\n",
       " 1.8110504380007288,\n",
       " 2.5684460728327534,\n",
       " 3.1487610877583165,\n",
       " 3.64396853019095,\n",
       " 4.073344650824303,\n",
       " 5.768449828048197,\n",
       " 8.160150803731382,\n",
       " 9.997217189326257,\n",
       " 11.543203181243685,\n",
       " 12.906928018524363]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filename : DistancesMean, PART: XsLp1\n",
    "### GRADED PART (DO NOT EDIT) ###\n",
    "part_3 = list(euclidean_distances_mean_vector)\n",
    "try:\n",
    "    part3 = \" \".join(map(repr, part_3))\n",
    "except TypeError:\n",
    "    part3 = repr(part_3)\n",
    "submissions[all_parts[2]]=part3\n",
    "grading.submit(COURSERA_EMAIL, COURSERA_TOKEN, assignment_key,all_parts[:3],all_parts,submissions)\n",
    "euclidean_distances_mean_vector\n",
    "### GRADED PART (DO NOT EDIT) ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Submission successful, please check on the coursera grader page for the status\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[0.06230677292748971,\n",
       " 0.061198079555789694,\n",
       " 0.0608126495018327,\n",
       " 0.059183678488410246,\n",
       " 0.05949007814616248,\n",
       " 0.05725268125796696,\n",
       " 0.05935452158486421,\n",
       " 0.05831142832530561,\n",
       " 0.05928563431624706,\n",
       " 0.059076129472239725,\n",
       " 0.05762985490169308,\n",
       " 0.059174927565307574,\n",
       " 0.0581599059610326]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filename: DistancesVariance, PART jD7SY\n",
    "### GRADED PART (DO NOT EDIT) ###\n",
    "part_4 = list(euclidean_distances_variance_vector)\n",
    "try:\n",
    "    part4 = \" \".join(map(repr, part_4))\n",
    "except TypeError:\n",
    "    part4 = repr(part_4)\n",
    "submissions[all_parts[3]]=part4\n",
    "grading.submit(COURSERA_EMAIL, COURSERA_TOKEN, assignment_key,all_parts[:4],all_parts,submissions)\n",
    "euclidean_distances_variance_vector\n",
    "### GRADED PART (DO NOT EDIT) ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Submission successful, please check on the coursera grader page for the status\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[0.1988768646152347,\n",
       " -0.021074633737255224,\n",
       " -0.05749817620192312,\n",
       " -0.0718962153911559,\n",
       " -0.006116609407693513,\n",
       " -0.023983251393225696,\n",
       " -0.05204557015527253,\n",
       " -0.018424595473803283,\n",
       " -0.0040378906739251515,\n",
       " -0.020853349346522568,\n",
       " -0.014025628984910854,\n",
       " 0.029458241353260126,\n",
       " -0.04396638054084743]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filename:  DistancesSkewness, PART: Ad4J0\n",
    "### GRADED PART (DO NOT EDIT) ###\n",
    "part_5 = list(euclidean_distances_skewness_vector)\n",
    "try:\n",
    "    part5 = \" \".join(map(repr, part_5))\n",
    "except TypeError:\n",
    "    part5 = repr(part_5)\n",
    "submissions[all_parts[4]]=part5\n",
    "grading.submit(COURSERA_EMAIL, COURSERA_TOKEN, assignment_key,all_parts[:5],all_parts,submissions)\n",
    "euclidean_distances_skewness_vector\n",
    "### GRADED PART (DO NOT EDIT) ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Submission successful, please check on the coursera grader page for the status\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[-0.6384013133225133,\n",
       " -0.2758439734602782,\n",
       " -0.15223233078033216,\n",
       " -0.07988375526844216,\n",
       " -0.01044769148587088,\n",
       " -0.08064860279897701,\n",
       " -0.02331335574782667,\n",
       " -0.020166667252636383,\n",
       " 0.10669665209383927,\n",
       " -0.0536906631006282,\n",
       " 0.024930971487188813,\n",
       " 0.003075352050577962,\n",
       " 0.06775391815498777]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filename: DistancesKurtosis, PART: 1nPFm\n",
    "### GRADED PART (DO NOT EDIT) ###\n",
    "part_6 = list(euclidean_distances_kurtosis_vector)\n",
    "try:\n",
    "    part6 = \" \".join(map(repr, part_6))\n",
    "except TypeError:\n",
    "    part6 = repr(part_6)\n",
    "submissions[all_parts[5]]=part6\n",
    "grading.submit(COURSERA_EMAIL, COURSERA_TOKEN, assignment_key,all_parts[:6],all_parts,submissions)\n",
    "euclidean_distances_kurtosis_vector\n",
    "### GRADED PART (DO NOT EDIT) ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# here we plot the stats for different sample sizes\n",
    "try:\n",
    "    plt.figure()\n",
    "    plt.plot( dimension_vector, euclidean_distances_mean_vector, 'r-', marker='o' )\n",
    "    plt.grid( True, which='both')\n",
    "    plt.minorticks_on()\n",
    "    plt.title( 'Mean of Euclidean Distances Distribution' )\n",
    "    plt.xlabel( 'Dimension' )\n",
    "    plt.ylabel( 'Mean of Euclidean Distances' )\n",
    "\n",
    "    plt.figure()\n",
    "    plt.plot( dimension_vector, euclidean_distances_variance_vector, 'r-', marker='o' )\n",
    "    plt.grid( True, which='both')\n",
    "    plt.minorticks_on()\n",
    "    plt.title( 'Variance of Euclidean Distances Distribution' )\n",
    "    plt.xlabel( 'Dimension' )\n",
    "    plt.ylabel( 'Variance of Euclidean Distances' )\n",
    "\n",
    "    plt.figure()\n",
    "    plt.plot( dimension_vector, euclidean_distances_skewness_vector, 'r-', marker='o' )\n",
    "    plt.grid( True, which='both')\n",
    "    plt.minorticks_on()\n",
    "    plt.title( 'Skewness of Euclidean Distances Distribution' )\n",
    "    plt.xlabel( 'Dimension' )\n",
    "    plt.ylabel( 'Skewness of Euclidean Distances' )\n",
    "\n",
    "    plt.figure()\n",
    "    plt.plot( dimension_vector, euclidean_distances_kurtosis_vector, 'r-', marker='o' )\n",
    "    plt.grid( True, which='both')\n",
    "    plt.minorticks_on()\n",
    "    plt.title( 'Kurtosis of Euclidean Distances Distribution' )\n",
    "    plt.xlabel( 'Dimension' )\n",
    "    plt.ylabel( 'Kurtosis of Euclidean Distances' )\n",
    "\n",
    "    matplotlib.pyplot.show()\n",
    "except: pass"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "coursera": {
   "course_slug": "guided-tour-machine-learning-finance",
   "graded_item_id": "qoIPX",
   "launcher_item_id": "rsGVU"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
